Clustering of random scale-free networks 
We derive the finite size dependence of the clustering coefficient of scale-free random graphs
generated by the configuration model with degree distribution exponent $2 < \gamma < 3$. Degree
heterogeneity increases the presence of triangles in the network up to levels that compare to
those found in many real networks even for extremely large nets. We also find that for values of
$\gamma \approx 2$, clustering is virtually size independent and, at the same time, becomes a
de facto non-self-averaging topological property. This implies that a single network instance is
not representative of the ensemble even for very large network sizes.
Null models are critical to gauge the effect that randomness may have on the properties of
systems in the presence of noise. It is therefore important to understand the null model at
hand as thoroughly as possible, something not always easy to achieve. This is the case of
the most widely used null model of random graphs, the configuration model (CM) [1]-[4].

Given a real network, the configuration model pre- 
serves the degree distribution of the real network, P(k), 
whereas connections among nodes are realized in the 
most random way, always preserving the degree sequence 
-either the real one or drawn from the distribution P(k)- 
and avoiding multiple and self-connections. In principle, 
the CM generates graphs without any type of correla- 
tions among nodes. For this reason, it is widely used in 
network theory to determine whether the observed topo- 
logical properties of the real network might be considered 
as the product of some non trivial principle shaping the 
evolution of the system. 

This program is severely hindered when the network contains nodes with degrees above the
structural cut-off $k_s = \sqrt{\langle k \rangle N}$ [5], where $\langle k \rangle$ is the
average degree and $N$ the size of the network. This is the case of scale-free networks with
$P(k) \sim k^{-\gamma}$, $\gamma < 3$, and a natural cut-off $k_c \sim N^{1/(\gamma-1)}$, the
case most often found in real complex networks [5]. This apparently simple null model develops
all sorts of anomalous behaviors in this case, e.g., the appearance of strong non-trivial degree
correlations among nodes [THS], difficulties in the sampling of the configuration space [10],
or the presence of phase transitions between graphical and non-graphical phases,
to name just a few.
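
As a rough illustration of why the natural cut-off exceeds the structural one for $\gamma < 3$, the two scaling forms can be compared numerically. This is a minimal sketch: the prefactor of the natural cut-off is set to 1 and the average degree of 4 is an arbitrary illustrative choice, neither of which comes from the text.

```python
import math


def structural_cutoff(mean_degree: float, n: int) -> float:
    """Structural cut-off k_s = sqrt(<k> N)."""
    return math.sqrt(mean_degree * n)


def natural_cutoff(n: int, gamma: float) -> float:
    """Scaling form k_c ~ N^(1/(gamma-1)); the prefactor 1 is an assumption."""
    return n ** (1.0 / (gamma - 1.0))


# Illustrative values only: <k> = 4 is a hypothetical average degree.
for gamma in (2.1, 2.5, 2.9):
    for n in (10**4, 10**6):
        k_s = structural_cutoff(4.0, n)
        k_c = natural_cutoff(n, gamma)
        print(f"gamma={gamma} N={n:>8} k_s={k_s:10.1f} k_c={k_c:12.1f}")
```

Because $1/(\gamma-1) > 1/2$ whenever $\gamma < 3$, the natural cut-off grows faster than $\sqrt{N}$ and eventually exceeds the structural cut-off at any fixed average degree.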

Clustering -or the presence of triangles in the network- is yet another example of anomalous
behavior associated with the CM. The importance of clustering as a topological property is
related to the fact that nearly all known real complex networks have a very large number of
triangles whereas the CM has a vanishingly small number in the thermodynamic limit. Of course,
the absence of triangles is convenient from a theoretical point of view as it allows us to use
generating function techniques to solve many interesting problems [6]. However, given the
empirical observations, it seems to be a quite unrealistic assumption. This has led to the
common understanding that
clustering observed in real networks cannot be explained 
by the CM and, thus, is the product of some underlying 
principle. While we fully agree with this statement, in 
this paper, we show that it must be taken with care. In- 
deed, depending on the heterogeneity of P(k), the CM 
can generate, on average, nearly size-independent levels 
of clustering. Besides, in such cases, sample-to-sample 
fluctuations do not vanish when $N \to \infty$, meaning that
the same degree sequence may generate either very high 
or very low levels of clustering, independently of the net- 
work size. 

Clustering can be quantified using different metrics [12]. Here, we use the average clustering
coefficient $C$, defined as the average (over nodes of degree $k \ge 2$) of the local clustering
coefficient of single nodes $c_i = 2T_i/[k_i(k_i - 1)]$, with $T_i$ the number of triangles
attached to node $i$. In the absence of high degree nodes, the clustering coefficient of a
random graph generated by the CM is given by [13]

$$C = \frac{\langle k(k-1) \rangle^2}{N \langle k \rangle^3} \qquad (1)$$

and, therefore, vanishes very fast in the limit of large system size. This is the reason why
the tree-like character of networks generated by the CM has always been taken for granted.
However, Eq. (1) is clearly incorrect when the degree distribution is scale-free, as it predicts
a behavior $C \sim N^{(7-3\gamma)/(\gamma-1)}$ that diverges for $\gamma < 7/3$. Equation (1)
fails in this case because its derivation does not account for the structural correlations among
degrees of connected nodes. In this paper, we derive the correct scaling behavior of the
clustering coefficient for scale-free random graphs with $2 < \gamma < 3$.
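
The divergence below $\gamma = 7/3$ follows from simple power counting, which the short sketch below reproduces. It assumes that the second moment of the degree distribution is dominated by the natural cut-off, $\langle k^2 \rangle \sim k_c^{3-\gamma}$ with $k_c \sim N^{1/(\gamma-1)}$, and that $\langle k \rangle$ stays finite; the function name is illustrative.

```python
def naive_clustering_exponent(gamma: float) -> float:
    """Exponent a in C ~ N^a predicted by the uncorrelated formula.

    Assumes <k^2> ~ k_c^(3-gamma) with k_c ~ N^(1/(gamma-1)) and a finite <k>,
    so that C ~ <k^2>^2 / N ~ N^(2(3-gamma)/(gamma-1) - 1).
    """
    second_moment_exponent = (3.0 - gamma) / (gamma - 1.0)
    return 2.0 * second_moment_exponent - 1.0


for gamma in (2.1, 7.0 / 3.0, 2.5, 2.9):
    a = naive_clustering_exponent(gamma)
    trend = "grows with N" if a > 1e-12 else "does not grow with N"
    print(f"gamma={gamma:.3f} exponent={a:+.3f} ({trend})")
```

The exponent simplifies to $(7-3\gamma)/(\gamma-1)$, which is positive for $\gamma < 7/3$; a clustering coefficient cannot grow without bound, which is what signals the failure of the uncorrelated formula.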

The CM, in its original formulation, defines a micro-canonical ensemble, in the sense that the
degree of every single node is given a priori and, once the degree sequence is fully known, the
network is assembled in the most random way while preserving the degree sequence. However, in
the case of scale-free networks, this approach resists any analytic treatment. Instead, here we
adopt a different strategy and work with the canonical ensemble of the CM. In this ensemble,
each node is given not its actual degree but its expected degree. This relaxes the topological
conditions to close the network and opens the door to an analytic treatment. Specifically, the
model is defined as follows:

1. Each node is assigned a hidden variable $\kappa$ drawn from the probability density
$\rho(\kappa) \propto \kappa^{-\gamma}$ with $1 < \kappa < \kappa_c$. The cut-off value
$\kappa_c$ is, in principle, arbitrary. However, $\kappa_c$ is often taken to be the so-called
natural cut-off, defined as the expected maximum value out of a sample of $N$ random deviates
drawn from the probability density $\rho(\kappa)$. In the case of interest of a scale-free
distribution, the natural cut-off scales as $\kappa_c \sim N^{1/(\gamma-1)}$.

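2. Each pair of nodes is visited once and connected with probability

$$r\left(\frac{\kappa}{\kappa_s}, \frac{\kappa'}{\kappa_s}\right) = \frac{1}{1 + \dfrac{\kappa_s^2}{\kappa \kappa'}} \qquad (2)$$

where $\kappa$ and $\kappa'$ are the hidden variables associated with each node,
$\kappa_s = \sqrt{\frac{(\gamma-1) N}{(\gamma-2) k_{\min}}}$, and $k_{\min}$ is the expected
minimum degree of the network. The particular form chosen for the connection probability
ensures that the entropy of the ensemble is maximal [HHTB].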

It can be shown that the average degree of a node with hidden variable $\kappa$ is
$\bar{k}(\kappa) \propto \kappa$ [HI El HE]. Thus, we can think of $\kappa$ and $\rho(\kappa)$
as the degree and degree distribution, respectively.
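
The two steps of the canonical model can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' code: it assumes a connection probability of the form $1/(1 + \kappa_s^2/\kappa\kappa')$, takes $\kappa_s^2 = (\gamma-1)N/[(\gamma-2)k_{\min}]$, and sets the natural cut-off to exactly $N^{1/(\gamma-1)}$. All function names are hypothetical.

```python
import random


def sample_kappa(gamma: float, kappa_c: float, rng: random.Random) -> float:
    """Inverse-transform sample from rho(kappa) ~ kappa^-gamma on [1, kappa_c]."""
    a = 1.0 - gamma
    u = rng.random()
    return (1.0 + u * (kappa_c ** a - 1.0)) ** (1.0 / a)


def generate_network(n: int, gamma: float, k_min: float = 2.0, seed=None):
    """Canonical CM sketch: returns a list of neighbor sets (O(N^2) pair loop)."""
    rng = random.Random(seed)
    kappa_c = n ** (1.0 / (gamma - 1.0))  # natural cut-off, prefactor 1 assumed
    kappa_s_sq = (gamma - 1.0) * n / ((gamma - 2.0) * k_min)  # assumed form
    kappas = [sample_kappa(gamma, kappa_c, rng) for _ in range(n)]
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed connection probability; ~ kappa*kappa'/kappa_s^2 when small.
            p = 1.0 / (1.0 + kappa_s_sq / (kappas[i] * kappas[j]))
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def average_clustering(adj) -> float:
    """Mean of c_i = 2 T_i / (k_i (k_i - 1)) over nodes with degree >= 2."""
    total, count = 0.0, 0
    for nbrs in adj:
        k = len(nbrs)
        if k < 2:
            continue
        # Count each edge between two neighbors of the node exactly once.
        triangles = sum(1 for u in nbrs for v in adj[u] if u < v and v in nbrs)
        total += 2.0 * triangles / (k * (k - 1))
        count += 1
    return total / count if count else 0.0


network = generate_network(300, 2.1, seed=7)
print("average clustering:", average_clustering(network))
```

Because every pair is visited once, the loop costs $O(N^2)$ operations, so this form is only practical for small networks; it is meant to make the definition concrete rather than to reproduce the simulations.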

Parameter $\kappa_s$ is the structural cut-off defining the onset of structural correlations,
that is, nodes with expected degrees below $\kappa_s$ are connected with probability
$r(\kappa/\kappa_s, \kappa'/\kappa_s) \approx \kappa\kappa'/\kappa_s^2$ and, therefore, are
uncorrelated at the level of degrees. As a consequence, the global level of correlations
present in the system is controlled by the cut-off $\kappa_c$. Whenever $\kappa_c < \kappa_s$
the resulting network is fully uncorrelated whereas for $\kappa_c > \kappa_s$ correlations are
necessary to close it. In this paper, we are interested in the range
$\kappa_s < \kappa_c \ll \kappa_s^2$.

Using the formalism developed in [17], the local clustering coefficient of a node with hidden
variable $\kappa$ can be written as




(3) 

The average clustering coefficient is computed from $c(\kappa)$ as
$C = \int \rho(\kappa) c(\kappa)\, d\kappa$. However, since $c(\kappa)$ is a bounded,
monotonically decreasing function, its major contribution to $C$ comes from nodes with small
degree, i.e., low $\kappa$. Therefore, to find the correct scaling behavior it suffices to
evaluate $c(\kappa)$ in the domain $\kappa \ll \kappa_s$. In this case, the maximum value within
the domain of integration $[1/\kappa_s, \kappa_c/\kappa_s]$ of the arguments $\kappa x/\kappa_s$
and $\kappa y/\kappa_s$ in Eq. (3) is of order $O(\kappa_c/\kappa_s^2)$, which goes to zero in
the thermodynamic limit. We can, thus, approximate $c(\kappa)$ as

which becomes independent of $\kappa$. After some manipulation, this expression becomes
where

$$\psi(\gamma) = \Phi(-1, 1, 3-\gamma) + \Phi(-1, 1, \gamma-2),$$

$$\phi(\gamma) = -\pi^2 \cot(\pi\gamma) \csc(\pi\gamma),$$

and $\Phi(z, a, b)$ is the Lerch transcendent function [19].
This expression, although involved at first glance, is convenient because in the range
$\kappa_s < \kappa_c \ll \kappa_s^2$ the arguments of the three Lerch transcendent functions
in it go to $0^-$ in the limit $\kappa_s \to \infty$, in which case we know that
$\Phi(-z^2, a, b) \sim b^{-a}$ for $z \to 0$. We then find the asymptotic behavior

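$$c(\kappa) \sim \kappa_s^{-2(\gamma-2)} \begin{cases} \phi(\gamma) + \Phi(-1, 2, \gamma-2) & \kappa_c = \kappa_s \gg 1 \\ 2\psi(\gamma) \ln\left(\dfrac{\kappa_c}{\kappa_s}\right) & \kappa_c \gg \kappa_s \gg 1. \end{cases} \qquad (6)$$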

The first line in this equation recovers the result found in [3] for scale-free networks
without structural correlations -$c(\kappa) \sim N^{2-\gamma}$ when $\kappa_s \sim N^{1/2}$-
whereas the second line predicts $c(\kappa) \sim N^{2-\gamma} \ln N$ when
$\kappa_c \sim N^{1/(\gamma-1)}$, which corrects the scaling behavior incorrectly predicted
by Eq. (1) in this case.

[Figure 1: clustering coefficient versus network size, N, with an inset; simulation curves for $\gamma$ = 2.1, 2.3, 2.5, 2.7, and 2.9]

FIG. 1: Clustering coefficient as measured in numerical simulations for different values of
$\gamma$ and network size with $k_{\min} = 2$ and $\kappa_c = N^{1/(\gamma-1)}$. Each point is
an average over $10^4$ different network realizations. Dashed lines are the numerical solution
of Eq. (3) and solid lines are the approximate solution given by Eq. (5). The inset shows an
extrapolation up to size $N = 10^8$ using Eq. fHl.

FIG. 2: Clustering coefficient as a function of $\gamma$ for different network sizes. Curves
are evaluated from Eq. (5) with $k_{\min} = 2$ and $\kappa_c = N^{1/(\gamma-1)}$.

FIG. 3: Sample to sample fluctuations. The top plot shows the probability density function of
the clustering coefficient as obtained from $10^4$ network realizations for $k_{\min} = 2$,
$\kappa_c = N^{1/(\gamma-1)}$, $\gamma = 2.1$, and $N = 10^4$. The bottom plot shows the
standard deviation of this pdf for different values of $\gamma$ as a function of the network
size. Solid lines are power law fits of the form $\sigma_C \sim N^{-z}$. The exponent $z$ is
shown in the inset.

Figure 1 shows a comparison between numerical simulations, the numerical solution of Eq. (3),
and the approximate solution given by Eq. (5), showing very good agreement. Interestingly, for
$\gamma = 2.1$, clustering remains nearly constant in the range of sizes $10^3$ to $10^5$ and
even increases slightly for small sizes. This is a consequence of the slow decay of the term
$\kappa_s^{-2(\gamma-2)}$ combined with the diverging logarithmic term in the numerator and
functions $\psi(\gamma)$ and $\phi(\gamma)$, which diverge in the limit $\gamma \to 2$.
In the inset of Fig. 1 we show the extrapolation of the clustering coefficient for sizes up to
$10^8$ evaluated with Eq. (pj). In the case of $\gamma = 2.1$, this figure makes evident the
extremely slow decay -nearly absent- with the system size. This implies that, in practice,
clustering cannot be removed from the network even in very large networks when
$\gamma \approx 2$. It is, thus, not clear whether the tree-like approximation, customarily
used to solve problems on random graphs, can be applied in this case. In this situation, one
should use alternative approaches, like the one developed in [18]. These results are
particularly relevant due to the abundance of real networks with values of $\gamma \approx 2$.
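
The near absence of decay for $\gamma = 2.1$ can be illustrated with the leading size dependence $N^{2-\gamma} \ln N$ alone. The sketch below drops all prefactors and sub-leading terms, so it reproduces trends rather than the values in Fig. 1; the function names are illustrative.

```python
import math


def corrected_scaling(n: float, gamma: float) -> float:
    """Leading size dependence N^(2-gamma) * ln N, with prefactors dropped."""
    return n ** (2.0 - gamma) * math.log(n)


def size_of_maximum(gamma: float) -> float:
    """Size N* at which N^(2-gamma) ln N peaks, from ln N* = 1/(gamma-2)."""
    return math.exp(1.0 / (gamma - 2.0))


for gamma in (2.1, 2.5, 2.9):
    ratio = corrected_scaling(1e8, gamma) / corrected_scaling(1e4, gamma)
    print(f"gamma={gamma}: peak at N*={size_of_maximum(gamma):.3g}, "
          f"value at N=1e8 relative to N=1e4 = {ratio:.3g}")
```

Under these simplifications the maximum for $\gamma = 2.1$ falls at $N^* = e^{10}$, inside the range $10^3$ to $10^5$ where the simulations show a nearly flat curve, and the scaling factor drops by only about 20% between $N = 10^4$ and $N = 10^8$.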

It is also interesting to study the behavior of clustering as a function of $\gamma$ for a
fixed network size. Figure 2 shows this behavior for different values of $N$. For each size,
there is an optimal value of $\gamma$ where clustering is maximal. In the case of
$k_{\min} = 2$ and $N > 10^3$, there is a critical value $\gamma_{crit} \approx 2.1$ below which
clustering increases with the network size up to a maximum and then slowly decreases.

Up to this point, we have been concerned only with the ensemble average of the clustering
coefficient. However, the CM ensemble shows strong sample-to-sample fluctuations. Figure 3
shows the probability density function of the clustering coefficient obtained out of a sample
of $10^4$ different networks generated by the canonical version of the CM. As can be observed,
clustering may take values in the range [0.05, 0.25] quite easily. Figure 3 also shows the
standard deviation $\sigma_C$ as a function of network size and for different values of
$\gamma$. In all cases, fluctuations decay as a power law of the system size,
$\sigma_C \sim N^{-z}$, with an exponent $z < 1$. Interestingly, for $\gamma = 2.1$, the
exponent $z$ takes a very small value ($z \approx 0.1$) that, when combined with the behavior
of $C$ as a function of $N$, results in a nearly constant coefficient of variation. This
implies that, in this range of values of $\gamma$, clustering is de facto a size-independent
but non-self-averaging property. That is, a single network instance is not a good
representative of the ensemble even for very large network sizes.
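
The two quantities used in this argument, the decay exponent $z$ and the coefficient of variation, can be estimated from samples as sketched below. The fit is an ordinary least-squares line in log-log space, which is one common choice and not necessarily the procedure used for Fig. 3, and the numbers in the usage lines are synthetic, chosen only to exercise the functions.

```python
import math


def fit_power_law_exponent(sizes, sigmas) -> float:
    """Return z in sigma ~ N^-z from a least-squares fit of log(sigma) vs log(N)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(s) for s in sigmas]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


def coefficient_of_variation(samples) -> float:
    """Sample standard deviation divided by the sample mean."""
    mean = sum(samples) / len(samples)
    variance = sum((c - mean) ** 2 for c in samples) / (len(samples) - 1)
    return math.sqrt(variance) / mean


# Synthetic data following sigma = N^-0.1 exactly (hypothetical, for illustration).
sizes = [10**3, 10**4, 10**5]
sigmas = [n ** -0.1 for n in sizes]
print("fitted z:", fit_power_law_exponent(sizes, sigmas))
```

If both the mean and the standard deviation of $C$ decay with nearly the same small exponent, their ratio stays roughly constant as $N$ grows, which is the practical meaning of a non-self-averaging property.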

The presence of triangles in real networks plays an important role in many processes taking
place on top of them, e.g., percolation phenomena, epidemic spreading, synchronization, etc.
It is, therefore, important to have full control over the simplest network ensembles that are
used as null models to assess the presence of underlying principles shaping the topology of
the system. In this paper, we have found the correct scaling behavior of the clustering
coefficient of the ensemble of scale-free random graphs with $2 < \gamma < 3$. Interestingly,
for values of the exponent $\gamma \approx 2$, clustering remains nearly constant up to
extremely large network sizes. However, in this case, clustering is not self-averaging. This
means that when comparing real networks against the CM, it is not enough to generate a single
network instance, as it may result in either a very low or a very high level of clustering even
for very large network sizes. These results are particularly important as the exponent value
$\gamma \approx 2$ seems to be -for yet unknown reasons- the rule rather than the exception in
real systems.